Introduction
Throughout this short course, we will be using the measurement and estimation of “social trust” as a guiding example. However, the aim is to help you build up your skills and confidence in asking and addressing research questions of your own, and the assessment questions will ask you to analyse some other chosen research topic.
Data transformation
Using the WVS7 dataset for the United Kingdom, and having identified the “social trust” variable from the Questionnaire and/or Codebook to be named “Q57” in the dataset, we obtain the following descriptive statistics:
- There appear to be 3 levels in the variable. From the questionnaire we know that there should be only 2 valid categories: 1 (Most people can be trusted) and 2 (Need to be very careful);
- From the Codebook we know that there may be some other values coded by the researchers as well: -1 for “Don’t know”, -2 for “No answer”, -4 for “Not asked” and -5 for “Missing”. These are coded with negative values so that they stand out as non-standard answer categories. They record the various reasons why a value may be “missing” in the dataset. In the UK dataset we are using here, three of these values are present;
- By looking at the distribution plot, we see that the tallest bar (the largest answer category) is the one coded as 2. The largest (most common) category of a categorical variable is called the “mode”. We can also request the “mode”, the “median” and other potentially informative summary statistics to be included in the Descriptive Statistics table by ticking the appropriate boxes under the Statistics option-bar in the Descriptive Statistics builder on the left-hand side:
According to the Descriptive Statistics table, the mean of the variable is 1.488. However, in this case this is not a sensible statistic. First of all, we have a categorical variable, so an average value between 1 (Most people can be trusted) and 2 (Need to be very careful) does not mean much. If anything, in the case of only two categories, we know that if category 1 and category 2 are of equal size (i.e. the same number of people have chosen each of them), then the mean would be \({1 + 2 \over 2} = 1.5\), so a mean of 1.488 would indicate that the category coded as 1 is slightly larger. Secondly, however, in our case the negative values are also added up in the calculation of the mean. Under these circumstances, the mean cannot tell us anything accurate or useful.
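To see how the negative codes distort the mean, here is a minimal Python sketch. It is not part of the JASP workflow, and the `responses` list is a small made-up example, not the WVS7 data. It compares the naive mean with the mean over the valid answers only, and checks that for a variable coded 1/2 the valid mean equals 2 minus the share of respondents answering 1.

```python
from statistics import mean

# Hypothetical toy responses, NOT taken from the WVS7 data:
# 1 = Most people can be trusted, 2 = Need to be very careful,
# negative values = missing-value codes (e.g. -1 Don't know, -2 No answer).
responses = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, -1, -2]

# Naive mean: the negative codes are treated as if they were real answers.
naive_mean = mean(responses)

# Mean over the valid answers only (codes 1 and 2).
valid = [r for r in responses if r in (1, 2)]
valid_mean = mean(valid)

# For a 1/2 coded variable, the valid mean equals 2 minus the share of 1s.
share_trusting = valid.count(1) / len(valid)

print(f"naive mean: {naive_mean:.3f}")
print(f"valid mean: {valid_mean:.3f}")
print(f"share answering 1: {share_trusting:.3f}")
```

In this toy example the naive mean (about 1.083) is pulled well below the valid mean (1.6), even though only two of the twelve responses are missing-value codes.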
To get the precise percentage distribution of the answer options (categories) of the variable, we need to request a Frequency table in the Descriptive Statistics builder, which we can do under the Tables option-bar:
The result would be:
We therefore find that almost 46% of the respondents in the dataset answered that “Most people can be trusted”. However, this percentage is calculated over all responses, including the “missing” ones. If we would like to get more accurate values for the “Valid Percent” (i.e. the percentage distribution among only those with valid response options, 1 or 2), then we will need to tell JASP to treat the negative values as “missing”.
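The difference between “Percent” and “Valid Percent” can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of how JASP computes its table. The `responses` list is made up, and the names `MISSING_CODES` and `frequency_table` are hypothetical helpers introduced here.

```python
from collections import Counter

# Hypothetical toy responses (not the real WVS7 data).
responses = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, -1, -2]

# Missing-value codes as listed in the Codebook.
MISSING_CODES = {-1, -2, -4, -5}

counts = Counter(responses)
n_total = len(responses)
n_valid = sum(c for value, c in counts.items() if value not in MISSING_CODES)


def frequency_table(counts, n_total, n_valid, missing_codes):
    """Return rows of (value, frequency, percent, valid percent or None)."""
    rows = []
    for value in sorted(counts):
        freq = counts[value]
        # Percent: share among ALL responses, missing codes included.
        percent = 100 * freq / n_total
        # Valid percent: share among valid responses only.
        if value in missing_codes:
            valid_percent = None
        else:
            valid_percent = 100 * freq / n_valid
        rows.append((value, freq, percent, valid_percent))
    return rows


table = frequency_table(counts, n_total, n_valid, MISSING_CODES)
for value, freq, percent, valid_percent in table:
    vp = "" if valid_percent is None else f"{valid_percent:.1f}"
    print(f"{value:>5} {freq:>5} {percent:>8.1f} {vp:>8}")
```

In this toy example, category 1 makes up 33.3% of all responses but 40.0% of the valid ones, because the two missing-value codes are dropped from the denominator.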
If you see more than two levels/categories in your variable (and you are sure that you have chosen the correct variable), it means that in your dataset the variable contains some custom missing values. Missing values should be distinguished with a negative sign (e.g. -2). To set them as custom missing values, we can edit the dataset manually.
- Click on the Edit Data menu tab and scroll horizontally to your variable. Double-click on the variable column. In the middle tabs, check the label editor and identify any values that should be set as missing (e.g. -2). Click on the Missing values tab > Use custom values and in the narrow field to the right type in the value you want to set as missing and click the + sign. You should be seeing something like below:
You can check back in the Label editor that the redundant value is no longer there.
- In the same data editor window we can also make other changes to our variables. For example, let’s change the non-informative variable Name to something that better reflects the meaning of the variable, for example “soc-trust”, and let’s also give it a more descriptive Long name and even a Description if we want to. The Long name could be something like “Social trust”, and for the Description we could copy the original survey question out from the questionnaire: “Q57. Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?”.
In the Label editor we should also attach labels to the values so that we can better read the outputs produced. We can copy the labels from the questionnaire here too: “1 = Most people can be trusted” and “2 = Need to be very careful”.
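What the editing steps above do to the data can be sketched in Python as follows. This is a rough analogy, not JASP’s internal behaviour: the `raw` values are made up, and `recode` is a hypothetical helper that turns missing-value codes into `None` and attaches the questionnaire labels to the valid codes.

```python
# Missing-value codes from the Codebook and value labels from the questionnaire.
MISSING_CODES = {-1, -2, -4, -5}
LABELS = {
    1: "Most people can be trusted",
    2: "Need to be very careful",
}


def recode(value):
    """Map a raw code to its label, or to None if it is a missing-value code.

    Any other unexpected code raises a KeyError, so it cannot slip through unnoticed.
    """
    if value in MISSING_CODES:
        return None
    return LABELS[value]


# Hypothetical raw values of Q57 for six respondents (not the real data).
raw = [1, 2, -2, 2, -1, 1]

# The recoded variable, analogous to the renamed "soc-trust" column.
soc_trust = [recode(v) for v in raw]
print(soc_trust)
```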
Your data editor window should look something like this:
Question
The variable measurement level (Column type) appears as Ordinal. Is that correct? If you feel that the variable type should be changed, you can also do that here using the drop-down menu next to Column type.
To return to your data analysis, you can click on the Analyses menu tab. If you have changed the variable’s name, you can move it back to the Variables field to see the updated outputs.
Find the other variables in the dataset that relate to social, interpersonal or institutional trust, and obtain similar descriptive statistics for them. Try to answer for yourself the same questions as above.
Add your notes to the Results output. When you hover with your mouse over the headings in the Results window, small black down-arrowheads appear next to them. Clicking on these opens a small menu that lets you perform various operations on the outputs (copy, save, edit, etc.), including Add note. Your note will appear under the item to which it is added. The note field acts as a basic text editor, which you can use to jot down your interpretation of the results and keep it close to the output. You can also add here your answers to some of the questions above.
Your results and notes could look something like this (based on the WVS7 dataset for Andorra):
Save your analysis. Click through Hamburger menu tab > Save as > Computer > Browse and save the analysis in your “WVS7” folder with the name “wvs7-example.jasp” (or anything else that you find useful). Once it’s saved, you can close the analysis. You can now open the .jasp file you saved and continue or modify the analysis you have started.
Now open a new JASP session and import the “ESS10” dataset you downloaded earlier, and perform a descriptive analysis on variables related to social, interpersonal and institutional trust, similar to the one you have done above. When complete, save that analysis to a .jasp file too.
Exercise 4: Begin your analysis for Assignment 1!
Below are some research questions that you can choose from to address in Assignment 1:
- Are religious people more satisfied with life?
- Are older people more likely to see the death penalty as justifiable?
- What factors are associated with opinions about future European Union enlargement among Europeans?
- Is higher internet use associated with stronger anti-immigrant sentiments?
- How does victimisation relate to trust in the police?
- What factors are associated with belief in life after death?
- Are government/public sector employees more inclined to perceive higher levels of corruption than those working in the private sector?
For now, choose the question that you find most appealing (you don’t need to stick with it for the assignment, but you could if you wanted to!). All of the questions can be answered with at least one of the survey datasets that you downloaded (the “WVS7” or “ESS10”), and often both contain relevant variables.
Identify your “explanandum” - i.e. the core phenomenon/concept/behaviour/etc. that the research question aims to explain. The questions all postulate a relationship/association between two or more variables (the topic of the next workshop), but for now, think carefully about the question and how it is formulated, and identify which is the variable that will be the target of explanation, and which variable (if mentioned) will be used for explaining it. For example, in the research question “Does education increase social trust?”, the variable we are interested in explaining is “social trust”, while “education” is the variable that we will use to explain it. In later workshops we will develop better vocabulary to describe associations between variables.
Once the core phenomenon to be explained is identified, look through the two survey questionnaires to identify any variables in the datasets that might capture it. This may require some trial and error with different search words.
Once you have found one (or several) candidate variable(s), navigate to the relevant survey website and select a single country for which to download data. You will be working with single-country datasets for your assignment. Download the dataset, import it into JASP, find the relevant variable and perform some descriptive analysis on the chosen variable as you have done in the previous exercise.
Make sure to add your notes and interpretations to the analysis results and save your analysis for later. You could create a new sub-folder for your “Assignment 1” work and save your analysis there for future use. If you end up liking your chosen question, you can continue this analysis in the next workshop.